This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.
library(tidyverse)
CHR2020=read_csv("/Users/yangdi/Desktop/import/2020CHR.csv")
New names:
• `95% CI - Low` -> `95% CI - Low...5`
• `95% CI - High` -> `95% CI - High...6`
• `# Deaths` -> `# Deaths...22`
• `95% CI - Low` -> `95% CI - Low...24`
• `95% CI - High` -> `95% CI - High...25`
• `# Deaths` -> `# Deaths...41`
• `95% CI - Low` -> `95% CI - Low...43`
• `95% CI - High` -> `95% CI - High...44`
• `# Deaths` -> `# Deaths...60`
• `95% CI - Low` -> `95% CI - Low...62`
• `95% CI - High` -> `95% CI - High...63`
• `95% CI - Low` -> `95% CI - Low...80`
• `95% CI - High` -> `95% CI - High...81`
• `95% CI - Low` -> `95% CI - Low...83`
• `95% CI - High` -> `95% CI - High...84`
• `95% CI - Low` -> `95% CI - Low...86`
• `95% CI - High` -> `95% CI - High...87`
• `95% CI - Low` -> `95% CI - Low...96`
• `95% CI - High` -> `95% CI - High...97`
• `95% CI - Low` -> `95% CI - Low...115`
• `95% CI - High` -> `95% CI - High...116`
• `95% CI - Low` -> `95% CI - Low...133`
• `95% CI - High` -> `95% CI - High...134`
• `# Uninsured` -> `# Uninsured...135`
• `% Uninsured` -> `% Uninsured...136`
• `95% CI - Low` -> `95% CI - Low...137`
• `95% CI - High` -> `95% CI - High...138`
• `# Uninsured` -> `# Uninsured...139`
• `% Uninsured` -> `% Uninsured...140`
• `95% CI - Low` -> `95% CI - Low...141`
• `95% CI - High` -> `95% CI - High...142`
• `95% CI - Low` -> `95% CI - Low...146`
• `95% CI - High` -> `95% CI - High...147`
• `Average Grade Performance` -> `Average Grade Performance...148`
• `Average Grade Performance (Asian)` -> `Average Grade Performance (Asian)...149`
• `Average Grade Performance (Black)` -> `Average Grade Performance (Black)...150`
• `Average Grade Performance (Hispanic)` -> `Average Grade Performance (Hispanic)...151`
• `Average Grade Performance (White)` -> `Average Grade Performance (White)...152`
• `Average Grade Performance` -> `Average Grade Performance...153`
• `Average Grade Performance (Asian)` -> `Average Grade Performance (Asian)...154`
• `Average Grade Performance (Black)` -> `Average Grade Performance (Black)...155`
• `Average Grade Performance (Hispanic)` -> `Average Grade Performance (Hispanic)...156`
• `Average Grade Performance (White)` -> `Average Grade Performance (White)...157`
• `95% CI - Low` -> `95% CI - Low...159`
• `95% CI - High` -> `95% CI - High...160`
• `95% CI - Low` -> `95% CI - Low...180`
• `95% CI - High` -> `95% CI - High...181`
• `# Deaths` -> `# Deaths...197`
• `95% CI - Low` -> `95% CI - Low...199`
• `95% CI - High` -> `95% CI - High...200`
• `95% CI - Low` -> `95% CI - Low...219`
• `95% CI - High` -> `95% CI - High...220`
• `95% CI - Low` -> `95% CI - Low...243`
• `95% CI - High` -> `95% CI - High...244`
• `95% CI - Low` -> `95% CI - Low...247`
• `95% CI - High` -> `95% CI - High...248`
• `95% CI - Low` -> `95% CI - Low...266`
• `95% CI - High` -> `95% CI - High...267`
Rows: 3193 Columns: 270
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (5): State, County, Other Primary Care Provider Ratio, Non-Petitioned Cases, Petitioned Cases
dbl (265): FIPS, Life Expectancy, 95% CI - Low...5, 95% CI - High...6, Life Expectancy (AIAN), Life Expectancy (AIAN) 95%...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# it's already tidy!
view(CHR2020)
CHR2021=read_csv("/Users/yangdi/Desktop/import/2021CHR.csv")
New names:
• `95% CI - Low` -> `95% CI - Low...5`
• `95% CI - High` -> `95% CI - High...6`
• `# Deaths` -> `# Deaths...22`
• `95% CI - Low` -> `95% CI - Low...24`
• `95% CI - High` -> `95% CI - High...25`
• `# Deaths` -> `# Deaths...41`
• `95% CI - Low` -> `95% CI - Low...43`
• `95% CI - High` -> `95% CI - High...44`
• `# Deaths` -> `# Deaths...60`
• `95% CI - Low` -> `95% CI - Low...62`
• `95% CI - High` -> `95% CI - High...63`
• `95% CI - Low` -> `95% CI - Low...80`
• `95% CI - High` -> `95% CI - High...81`
• `95% CI - Low` -> `95% CI - Low...83`
• `95% CI - High` -> `95% CI - High...84`
• `95% CI - Low` -> `95% CI - Low...86`
• `95% CI - High` -> `95% CI - High...87`
• `95% CI - Low` -> `95% CI - Low...96`
• `95% CI - High` -> `95% CI - High...97`
• `95% CI - Low` -> `95% CI - Low...115`
• `95% CI - High` -> `95% CI - High...116`
• `95% CI - Low` -> `95% CI - Low...133`
• `95% CI - High` -> `95% CI - High...134`
• `# Uninsured` -> `# Uninsured...135`
• `% Uninsured` -> `% Uninsured...136`
• `95% CI - Low` -> `95% CI - Low...137`
• `95% CI - High` -> `95% CI - High...138`
• `# Uninsured` -> `# Uninsured...139`
• `% Uninsured` -> `% Uninsured...140`
• `95% CI - Low` -> `95% CI - Low...141`
• `95% CI - High` -> `95% CI - High...142`
• `95% CI - Low` -> `95% CI - Low...148`
• `95% CI - High` -> `95% CI - High...149`
• `Average Grade Performance` -> `Average Grade Performance...150`
• `Average Grade Performance (Asian)` -> `Average Grade Performance (Asian)...151`
• `Average Grade Performance (Black)` -> `Average Grade Performance (Black)...152`
• `Average Grade Performance (Hispanic)` -> `Average Grade Performance (Hispanic)...153`
• `Average Grade Performance (White)` -> `Average Grade Performance (White)...154`
• `Average Grade Performance` -> `Average Grade Performance...155`
• `Average Grade Performance (Asian)` -> `Average Grade Performance (Asian)...156`
• `Average Grade Performance (Black)` -> `Average Grade Performance (Black)...157`
• `Average Grade Performance (Hispanic)` -> `Average Grade Performance (Hispanic)...158`
• `Average Grade Performance (White)` -> `Average Grade Performance (White)...159`
• `95% CI - Low` -> `95% CI - Low...161`
• `95% CI - High` -> `95% CI - High...162`
• `95% CI - Low` -> `95% CI - Low...182`
• `95% CI - High` -> `95% CI - High...183`
• `# Deaths` -> `# Deaths...199`
• `95% CI - Low` -> `95% CI - Low...201`
• `95% CI - High` -> `95% CI - High...202`
• `95% CI - Low` -> `95% CI - Low...221`
• `95% CI - High` -> `95% CI - High...222`
• `95% CI - Low` -> `95% CI - Low...245`
• `95% CI - High` -> `95% CI - High...246`
• `95% CI - Low` -> `95% CI - Low...249`
• `95% CI - High` -> `95% CI - High...250`
• `95% CI - Low` -> `95% CI - Low...253`
• `95% CI - High` -> `95% CI - High...254`
• `95% CI - Low` -> `95% CI - Low...272`
• `95% CI - High` -> `95% CI - High...273`
• `` -> `...277`
Rows: 3193 Columns: 277
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (6): FIPS, State, County, Other Primary Care Provider Ratio, Non-Petitioned Cases, Petitioned Cases
dbl (270): Life Expectancy, 95% CI - Low...5, 95% CI - High...6, Life Expectancy (AIAN), Life Expectancy (AIAN) 95% CI - ...
lgl (1): ...277
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
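# FIPS was parsed as a number in 2020 but as text in 2021; assuming the 2021 codes are zero-padded (e.g. "01001"), pad the 2020 codes to 5 digits so they match
CHR2020$FIPS=str_pad(as.character(CHR2020$FIPS), width = 5, pad = "0")
CHR=bind_rows(CHR2020, CHR2021)
# How do I find unmatched columns? How do I ignore them and continue?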
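A minimal sketch for the question above, using base R set operations on the column names. The two small data frames are hypothetical stand-ins for CHR2020 and CHR2021; with the real tables you would pass `names(CHR2020)` and `names(CHR2021)`.

```r
# Toy stand-ins for CHR2020 and CHR2021 (hypothetical columns and values)
chr_a <- data.frame(FIPS = c("01001", "01003"), x = 1:2, only_2020 = 3:4)
chr_b <- data.frame(FIPS = c("01001", "01003"), x = 5:6, only_2021 = 7:8)

# Columns that appear in one table but not the other
only_in_a <- setdiff(names(chr_a), names(chr_b))  # "only_2020"
only_in_b <- setdiff(names(chr_b), names(chr_a))  # "only_2021"

# Keep only the shared columns, then stack the rows
common <- intersect(names(chr_a), names(chr_b))
stacked <- rbind(chr_a[common], chr_b[common])
```

In dplyr the last step would be `bind_rows(select(CHR2020, all_of(common)), select(CHR2021, all_of(common)))`. Note that `bind_rows()` on its own already keeps unmatched columns and fills them with `NA`; it only stops when a shared column has conflicting types, which is why FIPS had to be converted first.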
library(tidyverse)
library(naniar)
rm(list=ls())
longdata = read_csv("/Users/yangdi/Desktop/import/chr_trends_csv_2021.csv")
Rows: 657649 Columns: 15
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (6): yearspan, measurename, statecode, countycode, county, state
dbl (6): numerator, denominator, measureid, chrreleaseyear, differflag, trendbreak
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#view(longdata)
unique(longdata$yearspan)
[1] "1997-1999" "1998-2000" "1999-2001" "2000-2002" "2001-2003" "2002-2004" "2003-2005" "2004-2006" "2005-2007" "2006-2008"
[11] "2007-2009" "2008-2010" "2009-2011" "2010-2012" "2011-2013" "2012-2014" "2013-2015" "2014-2016" "2015-2017" "2016-2018"
[21] "2017-2019" "2008" "2009" "2010" "2011" "2012" "2013" "2014" "2015" "2016"
[31] "2017" "2018" "2002" "2003" "2004" "2005" "2006" "2007" "2019"
yearindex=unique(longdata$yearspan)
test1=longdata %>% filter(yearspan %in% yearindex[1:21])
unique(test1$yearspan)
[1] "1997-1999" "1998-2000" "1999-2001" "2000-2002" "2001-2003" "2002-2004" "2003-2005" "2004-2006" "2005-2007" "2006-2008"
[11] "2007-2009" "2008-2010" "2009-2011" "2010-2012" "2011-2013" "2012-2014" "2013-2015" "2014-2016" "2015-2017" "2016-2018"
[21] "2017-2019"
#create another for 22 til end
test2=longdata %>% filter(yearspan %in% yearindex[22:length(yearindex)])
unique(test2$yearspan)
[1] "2008" "2009" "2010" "2011" "2012" "2013" "2014" "2015" "2016" "2017" "2018" "2002" "2003" "2004" "2005" "2006" "2007"
[18] "2019"
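Splitting by position (`yearindex[1:21]`) depends on the order in which `unique()` happens to return the values. A sketch of a split that does not depend on order, assuming (as the output above shows) that multi-year spans are exactly the values containing a hyphen. The vector is a toy stand-in for `longdata$yearspan`.

```r
# Toy vector standing in for longdata$yearspan
yearspan <- c("1997-1999", "2008", "1998-2000", "2019")

# TRUE for multi-year spans such as "1997-1999"
is_span <- grepl("-", yearspan, fixed = TRUE)

multi_year  <- yearspan[is_span]   # "1997-1999" "1998-2000"
single_year <- yearspan[!is_span]  # "2008" "2019"
```

With the real data this would be `filter(longdata, grepl("-", yearspan, fixed = TRUE))` and its negation.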
# How do I select a lot of variables at once? Tried: variable.names(childpov), longdata[,1:9]
widedata = pivot_wider(test2, names_from = c(measurename), values_from = c(numerator, denominator, rawvalue, cilow, cihigh, measureid, chrreleaseyear, differflag, trendbreak))
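Regarding the question about selecting many variables: a base R sketch of three common ways, by position, by a range of names, and by a name pattern. The one-row data frame is a hypothetical stand-in that reuses column names from `longdata`.

```r
# Toy stand-in for longdata (hypothetical values)
df <- data.frame(yearspan = "2019", numerator = 1, denominator = 2,
                 rawvalue = 0.5, cilow = 0.4, cihigh = 0.6)

# By position, like longdata[, 2:4]
by_pos <- df[, 2:4]

# By a range of names, like dplyr::select(numerator:rawvalue)
from <- match("numerator", names(df))
to <- match("rawvalue", names(df))
by_range <- df[, from:to]

# By a name pattern, like dplyr::select(starts_with("ci"))
by_pattern <- df[, startsWith(names(df), "ci")]
```

In the tidyverse the same selections are `select(numerator:rawvalue)` and `select(starts_with("ci"))`, and `pivot_wider()`'s `values_from` accepts the same tidyselect syntax.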
#dim(widedata)
#head(widedata)
table(longdata$measurename)
Adult obesity Air pollution - particulate matter Alcohol-impaired driving deaths
44680 47910 38308
Children in poverty Dentists Flu vaccinations
57440 31940 22358
Mammography screening Physical inactivity Premature death
22358 44680 67010
Preventable hospital stays Primary care physicians Sexually transmitted infections
22358 28746 38298
Unemployment rate Uninsured Uninsured adults
57440 35113 35132
Uninsured children Violent crime rate
35132 28746
childpov=subset(longdata, measurename=='Children in poverty')
table(childpov$yearspan)
2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
3190 3190 3190 3190 3190 3190 3190 3190 3190 3190 3190 3190 3190 3194 3194 3194 3194 3194
measures=unique(longdata$measurename)
#test3=childpov %>% order(yearspan)
test3=childpov %>% arrange(statecode, countycode, yearspan)
view(childpov)
dim(na.omit(childpov))
[1] 0 15
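`na.omit()` drops every row that contains at least one `NA`, so a single column that is missing everywhere is enough to return 0 rows. Which `childpov` column does this is not visible here; `colSums(is.na(childpov))` would show it. A sketch on a hypothetical toy table:

```r
# Toy data: one column is entirely missing (hypothetical stand-in for childpov)
df <- data.frame(rawvalue = c(0.2, 0.3, NA), trendbreak = c(NA, NA, NA))

nrow(na.omit(df))       # 0, because every row has at least one NA
colSums(is.na(df))      # rawvalue 1, trendbreak 3

# Drop columns that are entirely NA before calling na.omit()
df_kept <- df[, colSums(!is.na(df)) > 0, drop = FALSE]
nrow(na.omit(df_kept))  # 2
```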
missplot=gg_miss_var(childpov)
view(missplot$data)
pct_miss(childpov)
[1] 22.2312
n_complete(childpov)
[1] 670056
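naniar's `pct_miss()` and `n_complete()` count cells, not rows. The output above is consistent with that: `childpov` has 57440 rows × 15 columns = 861600 cells, and 861600 − 670056 = 191544 missing cells, which is 22.2312%. A base R sketch of the same two quantities on a toy table:

```r
# Toy table with 8 cells, 3 of them missing
df <- data.frame(a = c(1, NA, 3, 4), b = c(NA, NA, "x", "y"))

pct_missing <- mean(is.na(df)) * 100  # 3 / 8 cells -> 37.5
n_complete_cells <- sum(!is.na(df))   # 5 non-missing cells
```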
measures = unique(longdata$measurename)
pct_miss_measure = numeric(length(measures))
n_complete_measure = numeric(length(measures))
missplotlist = list()
missplottable = list()
for (i in seq_along(measures)) {
  val = measures[i]
  measure = subset(longdata, measurename == val)
  # results are not auto-printed inside a for loop; use print(table(measure$yearspan)) to see them
  #test3=measure %>% order(yearspan)
  #test3=measure %>% arrange(statecode, countycode, yearspan)
  #view(measure)
  #dim(na.omit(measure))
  missplotlist[[i]] = gg_miss_var(measure)
  # take the data of the plot just built for this measure (missplot still holds the childpov plot)
  missplottable[[i]] = missplotlist[[i]]$data
  pct_miss_measure[i] = pct_miss(measure)
  n_complete_measure[i] = n_complete(measure)
}
res_t = tibble(measures, pct_miss_measure, n_complete_measure)
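The loop above can also be written without index bookkeeping by splitting the data by measure and applying one function per group. A sketch using only base R; the toy table and the cell-level percentage (the same quantity `pct_miss()` reports) are assumptions for illustration.

```r
# Toy stand-in for longdata with two measures
df <- data.frame(measurename = c("A", "A", "B", "B"),
                 rawvalue = c(1, NA, NA, NA))

# One data frame per measure
groups <- split(df, df$measurename)

# Percentage of missing cells in each group
pct_miss_by_measure <- vapply(groups, function(g) mean(is.na(g)) * 100, numeric(1))
# A: 1 of 4 cells missing -> 25; B: 2 of 4 -> 50
```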
library(VIM)
Loading required package: colorspace
Loading required package: grid
Registered S3 method overwritten by 'data.table':
method from
print.data.table
VIM is ready to use.
Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
Attaching package: ‘VIM’
The following object is masked from ‘package:datasets’:
sleep
library(FactoMineR)
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
library(missMDA)
library(naniar)
dim(na.omit(widedata))
[1] 0 131
missplot=gg_miss_var(widedata)
view(missplot$data)
pct_miss(widedata)
[1] 64.11781
n_complete(widedata)
[1] 2719466
#summarise(longdata, yearspan)
longdata = read_csv("/Users/yangdi/Desktop/import/chr_trends_csv_2021.csv")
Rows: 657649 Columns: 15
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (6): yearspan, measurename, statecode, countycode, county, state
dbl (6): numerator, denominator, measureid, chrreleaseyear, differflag, trendbreak
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
longdata=separate(longdata, yearspan, into=c("startyear", "endyear"), sep="-")
Warning: Expected 2 pieces. Missing pieces filled with `NA` in 501279 rows [67011, 67012, 67013, 67014, 67015, 67016, 67017, 67018, 67019, 67020, 67021, 67022, 67023, 67024, 67025, 67026, 67027, 67028, 67029, 67030, ...].
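The warning comes from the single-year values ("2008", "2019", ...), which have no second piece. A base R sketch that treats a single year as a span starting and ending in that year, assuming that is the desired meaning:

```r
# Toy vector with one multi-year span and one single year
yearspan <- c("1997-1999", "2008")
parts <- strsplit(yearspan, "-", fixed = TRUE)

# First piece is the start; last piece is the end (same piece for single years)
startyear <- as.integer(vapply(parts, function(p) p[1], character(1)))
endyear   <- as.integer(vapply(parts, function(p) p[length(p)], character(1)))
# startyear: 1997 2008; endyear: 1999 2008
```

The tidyr route is `separate(..., fill = "right", convert = TRUE)`, which silences the warning and returns integers, followed by `mutate(endyear = coalesce(endyear, startyear))` if single years should not be left as `NA`.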
is.integer(longdata$startyear)
[1] FALSE
# How do I convert char into integer? as.integer() works, or pass convert = TRUE to separate()
longdata$startyear=as.integer(longdata$startyear)
longdata$endyear=as.integer(longdata$endyear)
# Comparing the columns returns one TRUE/FALSE per row, which is extremely long. To get a single answer, wrap it in all():
all(longdata$startyear==longdata$endyear-2,na.rm = TRUE)
[1] TRUE
library(Hmisc)
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:dplyr’:
src, summarize
The following objects are masked from ‘package:base’:
format.pval, units
analytic2=read_csv(file = "/Users/yangdi/Desktop/import/analytic_data2021.csv")
Rows: 3195 Columns: 690
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (690): State FIPS Code, County FIPS Code, 5-digit FIPS Code, State Abbreviation, Name, Release Year, County Ranked (Y...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
view(analytic2)
firstrow=read_csv(file = "/Users/yangdi/Desktop/import/analytic_data2021.csv", n_max = 1)
Rows: 1 Columns: 690
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (690): State FIPS Code, County FIPS Code, 5-digit FIPS Code, State Abbreviation, Name, Release Year, County Ranked (Y...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
view(firstrow)
varlabel=colnames(firstrow)
#view(varlabel)
analytic=read_csv(file = "/Users/yangdi/Desktop/import/analytic_data2021.csv", skip = 1)
Rows: 3194 Columns: 690
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (5): statecode, countycode, fipscode, state, county
dbl (559): year, county_ranked, v001_rawvalue, v001_numerator, v001_denominator, v001_cilow, v001_cihigh, v001_flag, v001...
lgl (126): v002_numerator, v002_denominator, v036_numerator, v036_denominator, v042_numerator, v042_denominator, v009_num...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
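The file has two header rows: readable labels first, short codes second. The approach above reads them separately; a natural next step is a lookup from code to label, which with the real data would be `setNames(varlabel, colnames(analytic))`. A self-contained base R sketch on a hypothetical three-column excerpt with the same layout:

```r
# Toy CSV with the same two-header layout: labels first, short codes second
csv_text <- "State FIPS Code,County FIPS Code,5-digit FIPS Code
statecode,countycode,fipscode
01,001,01001
01,003,01003"

# Row 1 only: the human-readable labels
labels <- unlist(read.csv(text = csv_text, header = FALSE, nrows = 1,
                          colClasses = "character"), use.names = FALSE)

# Skip row 1 so row 2 becomes the header; read as text to keep leading zeros
data <- read.csv(text = csv_text, skip = 1, colClasses = "character")

# Named lookup from code to label
codebook <- setNames(labels, names(data))
codebook["fipscode"]
```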
library(viridis)
library(tidyverse)
library(usmap)
library(ggplot2)
library(plotly)
library(rjson)
childpov$fips = paste(childpov$statecode, childpov$countycode,sep="")
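`paste()` only yields valid 5-digit codes if `statecode` and `countycode` are already zero-padded text; readr read them as character, so they may well be, and `all(nchar(childpov$fips) == 5)` would confirm it. If the parts were ever numeric, a sketch that pads them explicitly (the example codes are illustrative):

```r
# Numeric state and county codes (illustrative values)
state_num  <- c(1L, 6L)
county_num <- c(1L, 37L)

# Two-digit state + three-digit county, zero-padded
fips <- sprintf("%02d%03d", state_num, county_num)
# "01001" "06037"
```

As far as I know, the plotly counties GeoJSON loaded below keys counties by these zero-padded 5-character strings, so unpadded codes would silently fail to match.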
us_states = map_data("state")
us_counties = map_data("county")
# plot_usmap() expects one row per county, so map a single year
plot_usmap(data=filter(childpov, yearspan == "2019"), values="rawvalue")
# Can usmap be combined with a time-varying animation?
url = 'https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json'
counties = rjson::fromJSON(file=url)
g <- list(
scope = 'usa',
projection = list(type = 'albers usa'),
showlakes = TRUE,
lakecolor = toRGB('white')
)
childpov$yearspan=as.integer(childpov$yearspan)
plotmap1 = plot_ly(geojson=counties, locations=childpov$fips, z=childpov$rawvalue, colorscale="Viridis", zmin=0, zmax=1, type='choropleth')
plotmap2=plotmap1 %>%
layout(title='Child poverty',
geo = g)
print(plotmap2)
Warning: Ignoring 15 observations
Warning: Ignoring 15 observations
NULL
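# To animate by year, the choropleth needs one frame per yearspan:
plotmap3 = plot_ly(childpov, geojson = counties, locations = ~fips, z = ~rawvalue, frame = ~yearspan, colorscale = "Viridis", zmin = 0, zmax = 1, type = "choropleth") %>%
  layout(title = "Child poverty", geo = g) %>%
  animation_opts(frame = 100, transition = 0, redraw = FALSE) %>%
  animation_slider(currentvalue = list(prefix = "yearspan")) %>%
  animation_button(x = 1, xanchor = "right", y = 0, yanchor = "bottom")
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.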
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.